Conversation

@charithaintc charithaintc commented Aug 26, 2025

This PR adds the features needed to support the GEMM with transpose B case.

Summary of changes:

1. Add distribution logic for the vector.bitcast, vector.transpose, and memref.extract_aligned_pointer_as_index cases.
2. Add layout propagation support for vector.shape_cast, vector.broadcast, and vector.bitcast.
3. Incorporate the slice attribute and the DistributeLayoutAttr interface into the core layout propagation logic.

@charithaintc charithaintc requested review from Jianhui-Li, adam-smnk, chencha3 and silee2 and removed request for chencha3 August 26, 2025 23:27
@charithaintc charithaintc changed the title from "[mlir][xegpu] Add SIMT distribution support GEMM transpose B case." to "[mlir][xegpu] Add SIMT distribution support for GEMM transpose B case." Aug 26, 2025
// communication. So each lane must own the required number of elements to
// perform the bitcast locally without cross-lane communication.
int outInnerBitsPerLane = outData[rank - 1] * outElemTyBitWidth;
if (outInnerBitsPerLane < inElemTyBitWidth) {
Contributor

Check the condition:
srcInnerBitsPerLane = inElemTyBitWidth x sourceLayout.getLaneData
if (outInnerBitsPerLane != srcInnerBitsPerLane)

Contributor Author

I thought about this again. sourceLayout.getLaneData is not available to us, because that is exactly what we are trying to decide here. I think we can only detect the narrowing case.

The widening case will always be valid, because at this point the result already has a valid layout; otherwise the result was not assigned a correct layout, which would be a layout-conflict concern.

In any case, I added a check to verify that the result layout is valid and can be distributed to lanes.

Contributor

I see. I would move the check after the sourceLaneData is assigned. See comments below also.
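
For reference, here is a minimal standalone sketch of the rejected case; the f32 -> f16 element types and the lane_data value are assumptions chosen purely for illustration, and the variable names simply mirror the snippet quoted above:

    #include <cstdio>

    int main() {
      // Hypothetical narrowing bitcast: source f32 (32 bits) -> result f16 (16 bits).
      // Assume the result layout assigns lane_data = [1, 1], i.e. each lane owns a
      // single f16 along the innermost dimension.
      int inElemTyBitWidth = 32;  // source element bit width (f32)
      int outElemTyBitWidth = 16; // result element bit width (f16)
      int outInnerLaneData = 1;   // assumed result lane_data[rank - 1]

      int outInnerBitsPerLane = outInnerLaneData * outElemTyBitWidth; // 16 bits
      // 16 < 32: a lane's single f16 covers only half of a source f32, so the
      // source element would have to be split across two lanes. The propagation
      // rejects this because the bitcast cannot be done without cross-lane data.
      bool reject = outInnerBitsPerLane < inElemTyBitWidth;
      std::printf("reject = %d\n", reject); // prints: reject = 1
      return 0;
    }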

shapeCast.emitWarning("Expecting result type to be 1D or 2D vector.");
return;
}
// For 2D -> 2D shape cast, propagate the result layout to the source.
Contributor

Consider these restrictions for now:

  1. Same-rank shape casts are not allowed.
  2. Always expand dims, never squeeze them.
  3. The new dims must be 1, and the original dims must not change.

Contributor Author

Fixed. I also added this condition for now (a rough sketch of the combined checks follows below):

  1. The result layout cannot be a slice layout, and it must have the same rank as the result.
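
A minimal standalone sketch of what the combined restrictions (the three above plus this one) could look like as a check; the struct and function names here are illustrative only and are not the actual xegpu API:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative stand-in for the result layout: the real pass queries xegpu
    // layout attributes instead.
    struct LayoutInfo {
      bool isSlice;
      int64_t rank;
    };

    // Returns true only for rank-expanding shape casts that keep the original
    // dims in order and insert unit dims, with a non-slice result layout whose
    // rank matches the result.
    bool isSupportedShapeCast(const std::vector<int64_t> &srcShape,
                              const std::vector<int64_t> &resShape,
                              const LayoutInfo &resLayout) {
      // Restrictions 1 and 2: the cast must strictly expand the rank.
      if (resShape.size() <= srcShape.size())
        return false;
      // Added restriction: non-slice result layout with the same rank as the result.
      if (resLayout.isSlice ||
          resLayout.rank != static_cast<int64_t>(resShape.size()))
        return false;
      // Restriction 3: result dims are the source dims in order, with extra
      // unit dims interleaved.
      std::size_t srcIdx = 0;
      for (int64_t d : resShape) {
        if (srcIdx < srcShape.size() && d == srcShape[srcIdx])
          ++srcIdx;       // original dim preserved
        else if (d != 1)
          return false;   // a new dim that is not 1, or a changed original dim
      }
      return srcIdx == srcShape.size();
    }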

Contributor

@adam-smnk adam-smnk left a comment

Usually smaller PRs make reviews go faster but I'll bite 😉

Overall logic looks good, only minor comments.

for (int64_t idx : permutation) {
  newLayout.layout.push_back(laneLayout.layout[idx]);
  newData.layout.push_back(laneData.layout[idx]);
  laneLayout.push_back(static_cast<int32_t>(getLaneLayout()[idx]));
Contributor

How about adding one more utility to the layout attribute, like getTransposedLayout(), so that it can be reused for sg_layout or lane_layout?
Potentially, isTransposeOf can then be simplified to transposing the input and comparing whether the results are the same.

Contributor Author

Agreed. I will add this in a separate PR and clean it up.
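
For context, a minimal sketch of what such a reusable helper might look like; here the layout is modeled as a plain vector of int32_t rather than the actual XeGPU layout attribute, so the names and signatures are assumptions:

    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Illustrative helper: permute any layout-like array (sg_layout, lane_layout,
    // or lane_data) by the given permutation. The real utility would live on the
    // XeGPU layout attribute and return a new attribute.
    std::vector<int32_t>
    getTransposedLayout(const std::vector<int32_t> &layout,
                        const std::vector<int64_t> &permutation) {
      assert(layout.size() == permutation.size() && "rank mismatch");
      std::vector<int32_t> transposed;
      transposed.reserve(layout.size());
      for (int64_t idx : permutation)
        transposed.push_back(layout[idx]);
      return transposed;
    }

    // With such a helper, isTransposeOf could reduce to transposing one layout
    // and comparing for equality.
    bool isTransposeOf(const std::vector<int32_t> &a,
                       const std::vector<int32_t> &b,
                       const std::vector<int64_t> &permutation) {
      return getTransposedLayout(a, permutation) == b;
    }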

func.func @vector_shape_cast_2d_to_1d_dim0_distributed(%arg0: !xegpu.tensor_desc<16x1xf16>, %arg1: !xegpu.tensor_desc<16xf16>) {
  %c0 = arith.constant 0 : index
  %3 = xegpu.load_nd %arg0 : !xegpu.tensor_desc<16x1xf16> -> vector<16x1xf16>
  %2 = vector.shape_cast %3 : vector<16x1xf16> to vector<16xf16>
Contributor

This seems to contradict the documentation:
2) Shape cast must always expand the rank (e.g. 1D -> 2D).

and the code
https://github.com/llvm/llvm-project/pull/155517/files#diff-fcc9cdbf8bb4e5d37e661524b877082aee9b7badb0317f980c1881da564a926dR536

Not sure why this test is passing. Maybe I missed something?

Contributor Author

Sorry, I forgot to remove this test (CI was failing because of it). I have removed these tests now.

Contributor

@adam-smnk adam-smnk Sep 16, 2025

Shape cast must always expand the rank (e.g. 1D -> 2D).

If you refer to vector.shape_cast, a cast must preserve the same number of elements. Shape's rank can be freely changed up or down.

The two cases looked valid; it'd be good to understand why they failed.
If they can't be distributed, I'd leave them in as negative examples.

Contributor Author

@adam-smnk The restriction is there because we do not expect (for now) any narrowing shape casts. Shape cast is currently used to make the vector 2D after a 2D -> 1D reduction.

Adding back the tests as negative examples for now.

Contributor Author

My bad. The pass is designed to fail if we cannot assign a proper layout to an op, so I cannot add the negative example in the same file AFAIK.

Contributor

Hmm, then it's something to rethink if it impacts testing.
A separate test file would be fine, as this one's already pretty large. Not sure if verify-diagnostics can also test pass failures. TBD

Contributor Author

Not sure if verify-diagnostics can also test pass failures.

I think it can. The challenge is doing it in the same file; I did not find any examples, but I will give it a try.

return;
}
// Decide lane data based on whether the bitcast is narrowing or widening.
int64_t innerMostLaneData = isNarrowing ? outData[rank - 1] / bitCastRatio
Contributor

For narrowing bitcast, innerMostLaneData = outData[rank - 1] * bitCastRatio, instead of / bitCastRatio?
Put a TODO here?: check the layout conflict case here if ( innerMostLaneData * inElemTyBitWidth != outInnerBitsPerLane ).

Contributor Author

For narrowing bitcast, innerMostLaneData = outData[rank - 1] * bitCastRatio, instead of / bitCastRatio?

This is because in the narrowing case the source has a higher bitwidth than the result (e.g. f32 -> f16).

Put a TODO here?: check the layout conflict case here if ( innerMostLaneData * inElemTyBitWidth != outInnerBitsPerLane ).

This is not required. At this point in layout propagation the result layout is already a valid layout, and we choose innerMostLaneData such that innerMostLaneData * inElemTyBitWidth == outInnerBitsPerLane.
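
To make the arithmetic concrete, a small standalone sketch of the narrowing case; the f32 -> f16 element types and the lane_data value are assumptions for illustration:

    #include <cstdio>

    int main() {
      // Hypothetical narrowing bitcast: source f32 -> result f16.
      int inElemTyBitWidth = 32;   // f32
      int outElemTyBitWidth = 16;  // f16
      int bitCastRatio = inElemTyBitWidth / outElemTyBitWidth;       // 2
      int outInnerLaneData = 2;    // assumed result lane_data[rank - 1]: two f16 per lane

      int outInnerBitsPerLane = outInnerLaneData * outElemTyBitWidth; // 32 bits
      bool isNarrowing = inElemTyBitWidth > outElemTyBitWidth;        // true
      int innerMostLaneData = isNarrowing
                                  ? outInnerLaneData / bitCastRatio   // 2 / 2 = 1 f32
                                  : outInnerLaneData * bitCastRatio;
      // The invariant mentioned above: the source lane data covers exactly the
      // same bits as the result lane data, so no cross-lane exchange is needed.
      std::printf("%d\n", innerMostLaneData * inElemTyBitWidth == outInnerBitsPerLane); // prints 1
      return 0;
    }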

@charithaintc
Contributor Author

@adam-smnk Can you take another look and/or approve? :-)

Contributor

@Jianhui-Li Jianhui-Li left a comment

LGTM

@charithaintc charithaintc merged commit 2998c74 into llvm:main Sep 19, 2025
5 checks passed